


One of the main goals of the Human Genome Project is to sequence, in an international cooperative venture, the 3 billion or so base pairs of DNA that make up the human genome. The sequence of each chromosome (the sequence map) will allow the identification of the 80,000 or so human genes and provide a framework for studying how DNA variations among humans predispose towards various diseases. The project was initiated in 1990 by the US government (the National Institutes of Health and the Department of Energy) and has been joined by the United Kingdom, France, Germany and Japan. Its estimated cost is $3 billion over 15 years.

The first five years have focused on genetic and physical mapping. The genetic map is a representation of the order of the genes along the chromosomes; the physical map is a collection of identifiable overlapping fragments of DNA, together with a specification of how they are arranged along the chromosomes.

We are now entering the more complex sequencing phase. Existing approaches to sequencing the human genome are based on the assumption that each region to be sequenced must first be mapped. This step is dispensable. The alternative approach proposed here could also be applied to chromosomal regions such as gene families (encoding, for example, neural and olfactory receptors) and to simpler organisms.



The conventional approach (see the figure) uses a divide-and-conquer strategy with three different clone libraries, propagated in appropriate hosts such as bacteria or yeast (see the table). (A clone here comprises a vector with a single inserted fragment of human DNA. A library comprises the entire collection of human DNA fragments, each inserted into a vector molecule.) Low-resolution physical maps of each chromosome are first prepared by identifying shared landmarks (such as unique sites that can be amplified by PCR (sequence-tagged sites or STSs) or restriction-enzyme digestion sites) on overlapping yeast artificial chromosome (YAC) clones. High-resolution or sequence-ready maps are then made by randomly cutting and subcloning YAC inserts into cosmids, which are ordered by identifying their landmark overlaps. Next, a minimally overlapping set of cosmids is chosen for sequencing [...]. The resulting sequences are assembled computationally to give the sequence of the 40-kilobase (kb) cosmid insert. This random or shotgun approach gives a high degree of accuracy because every nucleotide is sequenced about 8 times (400 bases per clone × 800 clones = 320,000 bases of sequence). Most genome-wide or chromosome-wide maps are of low resolution and based on YACs.

This approach to genome-wide sequencing presents several challenges and has various limitations. First, initial attempts at high-resolution mapping of human chromosomes 16, 19 and 22 have been expensive, and it is difficult to obtain complete maps without any gaps. Completing sequence-ready maps for these and the other human chromosomes remains a formidable task.

Second, some 50 per cent of YAC clones show structural instability of inserts, resulting in deletion or rearrangement of cloned DNA, or are chimaeras in which two or more DNA fragments have been incorporated into one clone. These defective YACs are wholly unsuitable for use in mapping, and considerable effort is required to identify them. Cosmid inserts sometimes have aberrations that are similarly difficult to detect.

Third, regions containing tandem (adjacent) arrays of DNA units of high similarity (for example, tandem 21-kb arrays) and genome-wide repeats complicate high-resolution mapping when the size of the clone insert is less than, or similar to, that of the array, such as a 40-kb cosmid insert against a 105-kb DNA array.

Fourth, the conventional sequencing procedure is complex [...].

[...] mapping, collaboration among large and small groups is difficult.

[Figure. Conventional approach: cosmid library; random or shotgun plasmid or M13 library; sequence 800 clones; choose minimum cosmid overlap; sequence cosmids in both directions; cosmid walking. STC approach: sequence both ends and fingerprint each BAC; shotgun M13 or plasmid library; sequence 3,000 clones; find 30 overlapping BACs from STCs; sequence BACs with minimal overlap at each end and repeat; BAC walking.]
The conventional sequencing approach and the newly proposed sequence-tagged connectors (STC) approach. The bacterial artificial chromosome (BAC) clones in the STC approach could be sequenced by any cost-effective strategy.

These problems, and two important scientific advances, lead us to propose a new strategy for cooperative sequencing of the human genome. The first advance is that we can now sequence and assemble prokaryote genomes of up to several megabases (Mb) with high accuracy and fidelity. The second is the development of bacterial artificial chromosome (BAC) libraries that can accept human inserts of up to 350 kb. BAC clones seem to represent human DNA far more faithfully than their YAC or cosmid counterparts. For example, the 1-Mb locus encoding the human αδ T-cell receptor has been mapped using only 17 BAC clones, in contrast to the 75 or so cosmid clones that would have been required for the same coverage. Detailed landmark analyses show that only one of 17 BAC clones had a defect, a small 6-kb deletion (C. Boysen, personal communication).

What is more, BAC clones are excellent substrates for shotgun sequence analysis. For example, five out of five BAC clones, ranging in size from 89 to 210 kb, have been sequenced using this approach (C. Boysen, personal communication); other laboratories have had similar success rates. So BAC clones seem ideal for producing an accurate contiguous sequence.

Our new approach to genomic sequencing eliminates the need for any prior physical mapping and uses BAC clones as the basic sequencing reagent (see the figure). A human BAC library with an average insert size of 150 kb and about 15-fold coverage of the human genome contains 300,000 clones. These are arrayed into microtitre wells. Both ends (starting at the vector-insert points) of each BAC clone are then sequenced to generate 500 bases from each end. The 600,000 BAC end sequences are scattered roughly every 5 kb across the genome and make up 10 per cent of the genome sequence. We denote them 'sequence-tagged connectors', or STCs, because they allow any one BAC clone to be connected to about 30 others (for example, a 150-kb insert 'divided' by 5 kb will be represented in 30 BACs). The STCs would be made immediately available electronically on the World Wide Web.

Next, each BAC clone is fingerprinted using one restriction enzyme to provide the insert size and detect artefactual clones by comparing the fingerprints with those of overlapping clones. A seed BAC of interest is sequenced by any method and checked against the database of STCs to identify the 30 or so overlapping BAC clones. The two BAC clones showing internal consistency among the fingerprints and minimal overlap at either end are then sequenced. In this way, the entire human genome could be sequenced with just over 20,000 BAC clones (see the table).

Our approach has several advantages.


■ The cost and effort of obtaining complete low- and high-resolution maps are virtually eliminated, so the front-end automation is greatly simplified (including clone arraying, DNA purification, fingerprinting and sequence reactions).

■ The BAC clones can be made readily available to sequencing groups worldwide through resource centres and commercial distributors. Large centres could sequence many different BAC clones forming major contiguous regions of DNA, while small groups could contribute one or a few BAC sequences.

■ As improved techniques for generating BAC or other yet-to-be-developed libraries appear, reasonable numbers of these new clones could easily be added to the clone collection.

■ It is likely that our approach will eliminate the problem of generating complete maps without gaps for high-resolution physical mapping.
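
The seed-and-extend step of the STC approach (sequence a seed BAC, look up which STCs fall inside it, then pick the neighbour with minimal overlap at each end) can be sketched roughly as below. This is only an illustrative simulation under stated assumptions: the `Bac` class, the function names and the toy coordinates are hypothetical, genome positions stand in for the sequence matching that would place an STC within the seed, and the fingerprint consistency check is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Bac:
    """A toy BAC clone; coordinates (kb) are known only in this simulation."""
    name: str
    start: int
    end: int


def stc_hits(seed: Bac, library: List[Bac]) -> List[Bac]:
    """Clones with at least one end sequence (STC) inside the sequenced seed."""
    return [
        b for b in library
        if b != seed and (seed.start <= b.start <= seed.end
                          or seed.start <= b.end <= seed.end)
    ]


def pick_extensions(seed: Bac, library: List[Bac]) -> Tuple[Optional[Bac], Optional[Bac]]:
    """Return the (left, right) neighbours that overlap the seed the least."""
    hits = stc_hits(seed, library)
    left = [b for b in hits if b.start < seed.start]   # right-hand STC lies in seed
    right = [b for b in hits if b.end > seed.end]      # left-hand STC lies in seed
    best_left = min(left, key=lambda b: b.end, default=None)
    best_right = max(right, key=lambda b: b.start, default=None)
    return best_left, best_right


def walk_right(seed: Bac, library: List[Bac]) -> List[Bac]:
    """Repeat the extension step rightwards until no clone extends further."""
    path = [seed]
    while True:
        _, nxt = pick_extensions(path[-1], library)
        if nxt is None:
            return path
        path.append(nxt)


# Hypothetical seven-clone library on a 450-kb stretch.
library = [
    Bac("A", 0, 150), Bac("B", 60, 210), Bac("C", 100, 250),
    Bac("D", 180, 330), Bac("E", 240, 390), Bac("F", 120, 200),
    Bac("G", 300, 450),
]
seed = library[2]

if __name__ == "__main__":
    left, right = pick_extensions(seed, library)
    print("extend left with", left.name, "and right with", right.name)
    print("rightward walk:", [b.name for b in walk_right(seed, library)])
```

Choosing the hit whose STC lies closest to the seed boundary keeps redundant sequencing low, which is what lets the clone count for the whole genome stay close to a single tiling.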



CLONE LIBRARIES USED FOR GENOME MAPPING

Vector                                  Human DNA insert   Clones to cover genome (insert size)
Yeast artificial chromosome (YAC)       100-2,000 kb       3,000 (1,000 kb)
Bacterial artificial chromosome (BAC)   80-350 kb          20,000 (150 kb)
Cosmid                                  30-45 kb           75,000 (40 kb)
Plasmid                                 3-10 kb            600,000 (5 kb)
M13 phage                               1 kb               3,000,000 (1 kb)
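
The clone counts in the table and the coverage figures quoted in the text all follow from simple division, which the short sketch below reproduces. The function and variable names are illustrative, the genome size is taken as a round 3 billion base pairs, and the 1,000-kb YAC insert is an inference from the 3,000-clone entry rather than a legible value.

```python
GENOME_BP = 3_000_000_000  # "3 billion or so base pairs"

# Representative insert sizes (bp); the YAC value is inferred, see lead-in.
INSERT_BP = {
    "YAC": 1_000_000,
    "BAC": 150_000,
    "Cosmid": 40_000,
    "Plasmid": 5_000,
    "M13 phage": 1_000,
}


def clones_per_genome(insert_bp: int, genome_bp: int = GENOME_BP) -> int:
    """Clones needed to tile the genome once with no overlap."""
    return genome_bp // insert_bp


def fold_coverage(n_clones: int, bases_per_clone: int, target_bp: int) -> float:
    """Average number of times each base of the target is represented."""
    return n_clones * bases_per_clone / target_bp


table_counts = {vector: clones_per_genome(bp) for vector, bp in INSERT_BP.items()}

# Conventional shotgun step: 800 reads of 400 bases over a 40-kb cosmid.
cosmid_redundancy = fold_coverage(800, 400, 40_000)

# STC library: 300,000 BACs of 150 kb, 500 bases read from each end.
bac_library_coverage = fold_coverage(300_000, 150_000, GENOME_BP)
stc_count = 2 * 300_000
stc_spacing_bp = GENOME_BP // stc_count
stc_fraction = stc_count * 500 / GENOME_BP
bacs_per_connection = 150_000 // stc_spacing_bp

if __name__ == "__main__":
    for vector, n in table_counts.items():
        print(f"{vector:10s} {n:>9,d} clones")
    print("cosmid shotgun redundancy:", cosmid_redundancy)
    print("BAC library coverage:", bac_library_coverage)
    print("STCs:", stc_count, "spaced every", stc_spacing_bp, "bp")
    print("fraction of genome in STCs:", stc_fraction)
    print("BACs connected to each BAC:", bacs_per_connection)
```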
COMMENTARY



■ STSs (PCR-specific sites) and expressed sequence tags or ESTs (partial complementary DNA sequences) are easily placed on the BAC clones [...].

■ Chromosomal regions of key biological interest can be sequenced first.

■ The human genome can be [...].

■ The STC approach will provide [...].

■ This would be an efficient strategy for sequencing genomes of smaller organisms such as single-cell eukaryotes, as well as the model organisms that the genome project is committed to sequencing: the bacterium Escherichia coli, the [...].

[...] Several research teams could therefore work on the same [...].

Similarly, any DNA sequencing chemistries could be used, including those not yet [...].

The complete set of BAC end sequences and fingerprints [...] using, for example, 30 Applied Biosystems 377 sequencers [...] a small fraction of the cost of sequence-ready physical mapping that has yet to be incurred.

A highly cooperative combination of large genome centres and small groups could finish sequencing the entire human genome in under ten years. The current cost of DNA sequencing is $0.30 per finished base pair in the most efficient laboratories, and it is expected that it will fall to [...] per base pair in the next one to three years. At these rates, the total sequencing cost for the entire genome would be less than the genome funds spent so far. The Wellcome Trust in the United Kingdom has announced [...] Sanger Centre to sequence a sixth or [...] of the human genome [...] genome sequencing centres for [...] scale-up studies. There is [...] an improved strategy for [...] coordination. The STC strategy proposed here offers a [...] open scientific effort.

[...] Molecular Biotechnology, University of Washington, Box 357730, Seattle, Washington 98195-7730.